2 research outputs found
Neural Graph Matching for Modification Similarity Applied to Electronic Document Comparison
In this paper, we present a novel neural graph matching approach applied to
document comparison. Document comparison is a common task in the legal and
financial industries. In some cases, the most important differences may be the
addition or omission of words, sentences, clauses, or paragraphs. However,
comparison is challenging when the editing process has not been recorded or
traced. Given these temporal uncertainties, we explore the potential of our
approach to approximate an accurate comparison and determine which blocks are
edited versions of others. First, we apply a document layout analysis that
combines traditional and modern techniques to segment the layout into blocks
of various types. We then recast the task as a problem of layout graph
matching with textual awareness. Graph matching is a long-studied problem with
a broad range of applications. However, unlike previous work, which focuses on
visual images or structural layout, we also bring textual features into our
model to adapt it to this domain. Specifically, for electronic documents, we
introduce an encoder to handle the visual presentation decoded from the PDF.
Additionally, because modifications can make the layout analysis inconsistent
between versions of a document, and blocks can be merged or split, Sinkhorn
divergence is adopted in our graph neural approach to address both issues
through many-to-many block matching. We demonstrate this on two categories of
layouts, legal agreements and scientific articles, collected from our
real-case datasets.
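The many-to-many block matching mentioned in the abstract can be illustrated with a small sketch. It is not the authors' model: it swaps in plain Sinkhorn normalization (alternating row and column scaling of a similarity matrix) for the Sinkhorn divergence named above, and the names `sinkhorn`, `match_blocks`, `temperature`, and `threshold` are hypothetical. It assumes block similarity scores in the range [0, 1] have already been computed.

```python
import math


def sinkhorn(scores, temperature=0.5, iterations=50):
    """Alternately normalize rows and columns of exp(scores / temperature)."""
    # Subtract the global maximum before exponentiating for numerical stability.
    top = max(max(row) for row in scores)
    mat = [[math.exp((s - top) / temperature) for s in row] for row in scores]
    for _ in range(iterations):
        # Row step: scale each row so that it sums to 1.
        normalized = []
        for row in mat:
            total = sum(row)
            normalized.append([v / total for v in row])
        # Column step: scale each column so that it sums to 1.
        col_sums = [sum(col) for col in zip(*normalized)]
        mat = [[v / c for v, c in zip(row, col_sums)] for row in normalized]
    return mat


def match_blocks(soft, threshold=0.3):
    """Keep every (i, j) pair whose soft assignment exceeds the threshold.

    Thresholding, rather than taking one best match per row, is what lets
    one block pair with several blocks (for example after a split or merge).
    """
    return [(i, j)
            for i, row in enumerate(soft)
            for j, v in enumerate(row)
            if v > threshold]


# Hypothetical similarity scores between two blocks in each document version.
soft = sinkhorn([[0.9, 0.1], [0.2, 0.8]])
pairs = match_blocks(soft)
```

A lower `temperature` sharpens the assignment toward one-to-one matches but slows the convergence of the iteration, so both values here are illustrative choices rather than recommended settings.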
MuraNet: Multi-task Floor Plan Recognition with Relation Attention
The recognition of information in floor plan data requires the use of
detection and segmentation models. However, relying on several single-task
models can result in ineffective utilization of relevant information when there
are multiple tasks present simultaneously. To address this challenge, we
introduce MuraNet, an attention-based multi-task model for segmentation and
detection tasks in floor plan data. In MuraNet, we adopt a unified encoder
called MURA as the backbone with two separate branches: an enhanced
segmentation decoder branch and a decoupled detection head branch based on
YOLOX, for segmentation and detection tasks respectively. The architecture of
MuraNet is designed to leverage the fact that walls, doors, and windows usually
constitute the primary structure of a floor plan's architecture. By jointly
training the model on both detection and segmentation tasks, we believe MuraNet
can effectively extract and utilize relevant features for both tasks. Our
experiments on the CubiCasa5k public dataset show that MuraNet improves
convergence speed during training compared to single-task models like U-Net and
YOLOv3. Moreover, we observe improvements in the average AP and IoU in
detection and segmentation tasks, respectively. Our ablation experiments
demonstrate that the attention-based unified backbone of MuraNet achieves
better feature extraction in floor plan recognition tasks, and the use of
decoupled multi-head branches for different tasks further improves model
performance. We believe that our proposed MuraNet model can address the
disadvantages of single-task models and improve the accuracy and efficiency of
floor plan data recognition.
Comment: Document Analysis and Recognition - ICDAR 2023 Workshops. ICDAR 2023.
Lecture Notes in Computer Science, vol 14193. Springer, Cha
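For readers unfamiliar with the segmentation metric reported in the MuraNet abstract, the sketch below shows how intersection over union (IoU) is commonly computed for a pair of binary masks. It is a generic illustration, not the paper's evaluation code; the function name `mask_iou` and the nested-list mask format are assumptions made for this example.

```python
def mask_iou(pred, target):
    """IoU of two equally sized binary masks given as nested lists of 0/1."""
    intersection = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p and t:
                intersection += 1
            if p or t:
                union += 1
    # Two empty masks agree perfectly; returning 1.0 here is a convention.
    return intersection / union if union else 1.0


# Hypothetical 2x3 masks for a single class such as "wall".
predicted = [[1, 1, 0],
             [0, 1, 0]]
ground_truth = [[1, 0, 0],
                [0, 1, 1]]
score = mask_iou(predicted, ground_truth)
```

In a multi-class setting such as walls, doors, and windows, this value would typically be computed per class and then averaged.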